Primary exercises

Use %>% as much as possible in your solutions.

In the survey dataset:

  1. Right-handedness
  • How many individuals are right handed?
survey %>%  filter(hand=="right") %>% nrow()
[1] 216
  • Are there more right-handed males than females?
# right-handed females
rh_females <- survey %>%  
              filter(hand=="right" & gender=="female")  %>% 
              nrow() 
# right-handed males
rh_males <- survey %>%  
            filter(hand=="right" & gender=="male") %>% 
            nrow()

# Test whether there are strictly more right-handed males than females. 
#
# FALSE => the answer is no 
# TRUE => answer is yes
rh_males > rh_females  
[1] FALSE
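The two counts above can also be obtained in one pipeline. The following is a minimal sketch, not part of the original solution: it assumes dplyr is installed and uses a small made-up data frame `toy` (hypothetical values) in place of survey, with the same column names gender and hand.

```r
library(dplyr)

# Made-up stand-in for the survey data (values are for illustration only)
toy <- data.frame(
  gender = c("female", "female", "female", "male", "male", "male"),
  hand   = c("right",  "right",  "left",   "right", "left", "left")
)

# Keep the right-handed rows, then count them per gender
rh_counts <- toy %>%
  filter(hand == "right") %>%
  count(gender)

rh_counts
```

count(gender) returns one row per gender with the count in a column named n, so the comparison between males and females can be read off a single table.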
  1. Count the number of females who never smoke. Do the same for males.
female_smokers <- survey %>% filter(smokes=="never" & gender=="female") %>% nrow()
female_smokers
[1] 98
male_smokers <- survey %>% filter(smokes=="never" & gender=="male") %>%  nrow()
male_smokers
[1] 88
  1. Produce the percentages of female and male non-smokers.
tot_females <- survey %>% filter(gender=="female") %>%  nrow() # total females
tot_males <- survey %>% filter(gender=="male") %>%  nrow()     # total males

# Take female_smokers and male_smokers from the previous exercise. Despite their
# names, they hold the counts of individuals who never smoke.
(female_smokers/tot_females)*100 # % female non-smokers
[1] 83.76068
(male_smokers/tot_males)*100     # % male non-smokers
[1] 75.86207
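As an alternative to computing each total separately, the percentages can be sketched with group_by and summarise. This is an illustrative variant, not the original solution: it assumes dplyr is installed, and the data frame `toy` and the smoking levels other than "never" are made up. The denominator n() counts all rows in a group, including any with a missing smokes value, which mirrors the calculation above.

```r
library(dplyr)

# Made-up stand-in for the survey data (values are for illustration only)
toy <- data.frame(
  gender = c("female", "female", "female", "female", "male", "male"),
  smokes = c("never",  "never",  "never",  "occas",  "never", "regul")
)

# Per gender: never-smokers divided by group size, as a percentage
pct_never <- toy %>%
  group_by(gender) %>%
  summarise(pct_never = sum(smokes == "never", na.rm = TRUE) / n() * 100)

pct_never
```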
  1. The variables pulse, height and m.i have missing values (NA).
  • For each variable, count the number of missing values (NA). Use the function is.na as a condition to find the missing values.
pulse_NAs <- survey %>% filter(is.na(pulse)) %>%  nrow()   # nr. of NA's pulse variable 
pulse_NAs
[1] 45
height_NAs <- survey %>% filter(is.na(height)) %>%  nrow()  # nr. of NA's height variable 
height_NAs
[1] 27
mi_NAs <- survey %>% filter(is.na(m.i)) %>%  nrow()     # nr. of NA's m.i variable 
mi_NAs
[1] 27
  • The variables height and m.i have the same number of missing values (NA). Are these missing values in the same observations (rows), i.e. if height is missing, is m.i also missing in the same row?
# From the first part of this exercise we know that 27 observations are missing (NA) 
# in 'height' and 'm.i'. We only need to count how many observations in survey are 
# missing height and m.i at the same time. If that is equal to 27 then the answer is 
# yes.

# number of observations for which both height and m.i are NA
( survey %>% filter(is.na(height) & is.na(m.i))  %>% nrow() ) == height_NAs
[1] TRUE
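
The missing-value counts and the row-by-row comparison can also be sketched more compactly. The example below is an assumption-laden illustration, not the original solution: it needs dplyr 1.0.0 or later for across(), and it runs on a made-up data frame `toy` whose values (including the m.i labels) are hypothetical.

```r
library(dplyr)

# Made-up stand-in for the survey data (values are for illustration only)
toy <- data.frame(
  pulse  = c(70, NA, 65, NA, 80),
  height = c(170, NA, 165, 180, NA),
  m.i    = c("metric", NA, "imperial", "metric", NA)
)

# Count the NAs in several columns at once
na_counts <- toy %>%
  summarise(across(c(pulse, height, m.i), ~ sum(is.na(.x))))

# TRUE only if height and m.i are missing in exactly the same rows
same_rows <- toy %>%
  summarise(same = all(is.na(height) == is.na(m.i))) %>%
  pull(same)

na_counts
same_rows
```

Comparing is.na(height) with is.na(m.i) row by row checks the pattern directly, so it does not rely on the two counts being equal beforehand.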

Extra exercises

  1. What is the maximum female height, in inches, in the survey dataset?
# The 'height' variable has missing values, i.e. 'NA'. Most functions in R, e.g. sum, 
# max, min, median, mean and so on,  when operating on vectors with missing values 
# produce a NA, and rightly so. However, the functions are equipped with an additional 
# argument 'na.rm' if you choose to run the function on the non-missing part of the 
# vector. Beware that max(weig) 

survey %>% filter(gender=="female") %>%  
          mutate(height_inch=height * 0.393701) %>% 
          pull(height_inch) %>%  max(na.rm = TRUE)
[1] 70.07878
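The effect of na.rm described in the comment can be seen on a small vector. This is a minimal base R sketch; the vector `heights` is made up for illustration.

```r
# Made-up heights in cm, one of them missing
heights <- c(160, NA, 175.5, 168)

max_with_na    <- max(heights)                # NA: the missing value propagates
max_without_na <- max(heights, na.rm = TRUE)  # 175.5: NA dropped before taking the maximum

max_with_na
max_without_na
```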
  1. What is the percentage of right-handed individuals among females who frequently exercise? What about males?
# 1) First we create a tibble 'female_exercise_freq' for females with exercise=="freq"
female_exercise_freq <- survey %>% 
  filter(gender=="female" & exercise=="freq") 

# 2) From the group in 'female_exercise_freq' calculate the total ('female_freq_count') 
# and the nr. of right-handed ones ('female_freq_rh_count').
female_freq_count <- female_exercise_freq %>% 
  nrow()  # calculate total nr. of females who frequently exercise
female_freq_rh_count <- female_exercise_freq %>% 
  filter(hand=="right") %>% nrow() #  nr. of right-handed ones

# 3) calculate the fraction and the percentage.
(female_freq_rh_count / female_freq_count)*100  # % 
[1] 93.75
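
The solution above covers females only. One way to answer the same question for males is to wrap the three steps in a function and call it once per gender. The sketch below is illustrative: the helper `pct_right_handed` and its argument names are hypothetical, dplyr is assumed to be installed, and the data frame `toy` is made up, so its output says nothing about the real survey data.

```r
library(dplyr)

# Hypothetical helper: % right-handed among frequent exercisers of one gender
pct_right_handed <- function(data, which_gender) {
  group <- data %>%
    filter(gender == which_gender & exercise == "freq")
  rh_count <- group %>%
    filter(hand == "right") %>%
    nrow()
  (rh_count / nrow(group)) * 100
}

# Made-up stand-in for the survey data (values are for illustration only)
toy <- data.frame(
  gender   = c("female", "female", "male", "male", "male", "male", "male", "female"),
  exercise = c("freq",   "freq",   "freq", "freq", "freq", "freq", "some", "none"),
  hand     = c("right",  "left",   "right", "right", "right", "left", "right", "right")
)

pct_female <- toy %>% pct_right_handed("female")
pct_male   <- toy %>% pct_right_handed("male")

pct_female
pct_male
```

With the real data, the same call would be survey %>% pct_right_handed("male"), assuming survey has the columns gender, exercise and hand used above.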


Copyright © 2021 Biomedical Data Sciences (BDS) | LUMC